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Introduction 



The main aim of Approximation Theory is to reconstruct a given function 
defined on a set i7 C M" from some values sampled on a finite set X C 
This process is required to be convergent and stable, namely to be such that 
under suitable conditions the approximant is able to reproduce the original 
function with respect to a given norm. 

In this setting, the so-called Kernel methods are of growing importance. Al- 
tough they are built to be well-posed for every data distribution, it is also 
well-known that in many cases they suffers from serious instability if no at- 
tention is paid to some aspects of their use. 

Several approaches have been studied to assure a fast convergence together 
with a stable computation. They are based on different aspects of the ap- 
proximation process, from the optimization of certain parameters and the 
research of convenient data sets, to the numerical stabilization of the under- 
lying linear system. 

Recently the two papers 0,11] introduced a new tool, that is a general way to 
produce a stable basis for the functional space ^/^(fi) where the approxima- 
tion takes place, based only on a particular factorization of the collocation 
matrix associated to the kernel. In the original works this process was used 
to create a Newton basis, which is in particular stable, complete, orthonormal 
and recursively computable. 

We use these results to build a different kind of stable basis that shares 
some useful properties with the one proposed by the authors. Moreover, our 
basis provides a connection with a "natural" basis for the functional space 
J\fq,{Q), which arises from an eigendecomposition of a compact integral oper- 
ator associated with the kernel, and which brings intrinsic information about 
the kernel itself and about the set fl. 

On the numerical side, the structure of the basis allows to further stabilize 
the approximation by moving from an exact data interpolation to an approx- 
imation in the least-squares sense, with a process that exactly corresponds 
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to a low-rank approximation of the kernel matrix. 

The work is structured as follows: Chapter [T] gives a short introduction 
to the theory of Radial Basis Function; Chapter [2] describes the particular 
procedure introduced in the papers [H H] that will be the starting point for 
the development of our basis, which is constructed and analyzed in Chapter 
[31 Chapter m presents some numerical examples which test our method from 
different point of view. The last Chapter discusses potential work that could 
be done to improve and better understand our results. 



Chapter 1 

Radial Basis Functions 



In this chapter we will give a brief introduction to the general theory of Ra- 
dial Basis Functions (RBF). 

We will start from some remarks on multivariate approximation that moti- 
vates the introduction of this kind of technique, then we will recall the basic 
theoretical results that arises in this context, focusing mainly on the tools 
that will be used in the following chapters. 

The last section summarizes the main convergence and stability estimates, 
with particular focus on possible weakness of RBF approximation. 

Each result presented here is taken from the book [7], that contains a com- 
plete coverage of the theory of RBF as well as a much more general treatment 
of the Scattered Data Approximation. 

1.1 Motivation 

The goal is to reconstruct a function / G C{Q) defined on a set f2 G 
from its samples at a fixed, discrete data-sites set X G Q, of cardinality 
\X\ = N e N. 

The idea is to fix a finite dimensional subset V C C{Q) that allows a suffi- 
ciently good approximation of the full space and to use the data-values 
to represent the function in this subspace. 

To put this in practice, if V is spanned by a basis {vi, . . . ,vn}, one wants to 
find an interpolant P[f] E V such that 

N 

i=i 
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where the coefficient vector [ci, . . . , cat]^ is clearly the solution of the linear 
system 

■ Cj = f{xi) i,j = 1,...,N 

The matter is now to choose a good subset V, i.e. a subset for which the 
problem is well-posed. These space are the so-called Haar spaces: 

Definition 1.1 (Haar space). Suppose that Q C M"^ contains at least N 
points. Let V C C{Q) be an N - dimensional linear space of functions, and let 
{vi, . . . ,vn} be a basis for V. Then V is called a Haar space of dimension 
N ouQif 

dei{vj{xi)) ^ 
for any distinct points xi, . . . ,xisr G Jl. 

This is, for example, the case of the space 11^ of the univariate polynomial of 
degree at most d, that is indeed widely used and studied in approximation 
theory. 

Unfortunately if the dimension n of the underlying real space is bigger than 

1, there is no hope to find an Haar space: 

Theorem 1.1 (Haar - Mairhuber - Curtis). Suppose that C R", n ^ 

2, contains an interior point. Then there exists no Haar spaces on fl of 
dimension N ^ 2. 

The last Theorem forces to use, in the multivariate setting, spaces and bases 
that depend on the choosen points. 

In this context takes place the RBF approximation. Indeed, given a ker- 
nel 

we build a data-sites dependent basis 

rx^{H;Xi),...,^;XM)} (1.1) 

that can be used to approximate a function / as described. 
In particular we will solve the linear system 

{^j{Xi,Xj))^^.-Cj = f{Xi). 

The problem is well-posed if the so-called kernel matrix A :— (^j(xi,Xj)).j 
is invertible for every possible choice of the data-sites X. 
The way to ensure this is to require that the kernel $ is strictly positive 
definite: as shown by the following definition, a positive definite kernel leads 
to a positive definite and hence invertible kernel matrix: 
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Definition 1.2. A continuous function $ : x R" — > R called pos- 
itive definite if, for all N E N, for all sets of pairwise distinct centers 
X = {xi, . . . , a;Ar} G R" and for all a G \ {0}, the quadratic form 



is positive. 

Even though this hypotesis on $ suffices to have an unique solution for 
the approximation problem, a further requirement gives some advantages: 
it's common to assume that $ is radial, in the sense specified by the next 
definition. 

Definition 1.3. A function $ : R" x R" — )■ R zs said to be radial if there 
exist a function : [0, oo) — >■ R such that y) = (f){\\x — y\\2)- 

This property allows to use the same structure for every dimension n G N 
(dimension-blindness), and moreover assures that the kernel is symmetric, 
together with the kernel matrix A. Nevertheless, there are some kernels that 
are positive definite only for some dimension n of the space. 

In [71 Ch.6] is also possible to find a complete characterization of this kind 
of functions, based on generalized Fourier transforms. 

Some relevant examples of radial and positive definite kernel functions that 
will be used in this work are listed in the table fll.ll) . It's common to use a 
radially scaled version 



of the kernel, in order to have more control on the behaviour of the approx- 
imant. The parameter £ G R is referred as shape parameter. 

1.2 Native Space 

There is a natural space in which consider the RBF approximation. In fact, 
for each positive definite and symmetric kernel $ and for each region f2 C R" 
it is possible to define an associated real Hilbert space, the so-called Native 
Space J\f^{Q). In this space $ is a reproducing kernel, in the sense of the 
following definition: 

Definition 1.4 (RK). Let H be a real Hilbert space of functions / : f2 — )■ R. 
A function $ : f2 x f2 — R zs called a reproducing kernel for "H if 



N 



N 




j=l i=l 



^{x,y) = (f){e\\x - y\\2) 
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0(r) 


dim 


Gaussian 




n 


Inverse Multiquadric (IMQ) 


1/^1 + {erf 


n 


Generalized IMQ 


1/(1 + {erff 


n 


Inverse quadratic (IQ) 


1/(1 + er) 


n 


Linear Matern (IMAT) 


e-^'^(l + er) 


n 


Quadratic Matern (2MAT) 


e-=''(3 + 3£r+ (£r)2) 


n 


Cubic Matern (3MAT) 


e'''\lb + Iher + Q{erf + {erf)) 


n 


Linear Laguerre-Gaussian 


e-(-0^(2- (£r)2)) 


2 


Quadratic Laguerre-Gaussian 


e-(^-)^(3-3(£r)2) + i(£r)4) 


2 


Linear generalized IMQ 


(2 - {er)-)/{l + {erf))^ 


2 


Wendland 2, (W20) 


(1 - er)l 


2 


Wendland 2, 1 (W21) 


(1 -£r)^(3£r + l)/20 


2 



Table 1.1: Examples of radial positive definite kernels. The table shows the 
functions : M ^ M such that $(•, •) := 0(|| • — • II2) is positive definite in 
R", where n is as in the third column. 
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1. $(•,?/) en yyen 

^- f{y) = {.fi^{.'iy))'H V/ G "H, Vx G f2 (reproducing property) 

It's well known that the existence of a positive definite reproducing kernel 
for an Hilbert space H is equivalent to the continuity and the linear inde- 
pendence of the point evaluation functionals 5x G H* . 

The construction of such a space works as follows. Define the linear space 

7V$(Q) :=span{<l>(-,|/) :yG (1.2) 
and equip it with the bilinear form 

(N M \ N M 

J]a,$(-,a;,),^A*(-,a;,) J := ^ ^ a,A*(a:i, a;^). 
j=l i=l / ^ j=l i=l 

Then the space {N<s,{fl), (•,•)#) is almost what we want: 

Theorem 1.2. Under the above assumptions on the kernel, (-, ■)$ defines an 
inner product on N^{fl), which is a pre-Hilbert space with reproducing kernel 

The completion of A^$(n) with respect to the || ■ ||<i,-norm is the Native Space 
A/$(f)), which is an Hilbert space with $ as reproducing kernel. 
This space clearly contains elements / not in A^$(f2), that can be understood 
as functions of the form 

/(a;) = (/,$(-, x))$ Wxen 

since the continuity of 5x is preserved in the completion. 

This space is unique in the sense that if ^ is a real Hilbert space of functions 
/ : Q — > R with reproducing kernel then Q — J\f^{Q) and the inner prod- 
ucts are the same. 

Moreover it can be shown that if an Hilbert space Q has a reproducing ker- 
nel, it's necessarily positive definite and symmetric, and then there is a full 
equivalence between Reproducing Kernel Hilbert Spaces and Native Spaces 
for positive definite and symmetric kernels. 

1.2.1 Embeddings 

For their use in approximation, it is useful to know what kind of functions 
belongs to the Native Space and how the smoothness of the kernel is inherited: 
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Theorem 1.3 {C^{Vl)). Let Q C M" be an open set and let $ G C^^iQ x Q) be 
a symmetric, positive definite and radial kernel on Q. Then A/$(r2) C C^(r2) 
and Va G N", |a| ^ A;, V/ G and Wx e n 

D-/(x) = (/,D-$(-,x))<, 

The case k=0 in particular ensures that the || ■ ||$-convergence implies the 
pointwise convergence (it's also a direct consequence of the reproducng kernel 
property). 

Theorem 1.4 (L2(f2)). Let Q C be a compact set and let ^ be a symmet- 
ric, positive definite and radial kernel on Q. Then the Native Space A/'$(fi) 
has a continuous linear embedding into L2{Q), and in particular 

where \Q\ := meas{Q) 

1.2.2 An integral operator and a "natural" basis 

It is possible to define an integral operator T$ associated to the kernel and 
defined on the Native Space. 

Construction and properties of such operator are discussed in detail in the 
paragraph 10.4 of the book [7], where it is the base for a further character- 
ization of A/$(fi). Here we are interested mainly in a particular basis that 
arises from an eigendecomposition of it, which would be the key tool for the 
subsequent discussion. 

Consider the operator T<j, : L2{^1) — ?■ L2{^1) defined by 

TMi^) ■■= [ ^x,y)f{y)dy V/ G L2{n), Wx e Q (1.3) 
Jn 

that maps L2{fl) continuously into 7V$(r2). It is the adjoint of the embedding 
operator of A/$(i7) into L2{^1), i.e. 

{f,v)L,in) = {f,T^[v]h V/ G A^^(^]), \/v G L2{n). (1.4) 

A particular and in some sense "natural" basis for A/#(i^) comes from the 
following theorem: 

Theorem 1.5 (Mercer). Every continuous positive definite kernel $ on a 
bounded domain f2 C defines an operator 

n:Af^in)^Af^{n), T4f] = [ ^x,y)fiy)dy 

Jn 
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which is bounded, compact and self-adjoint. It has an enumerable set of 
eigenvalues and eigenvectors {^Pj}j>o, 

XjiPj{x)= / ^{x,y)(pj{y)dy Vx G 
Jn 

which forms an orthonormal basis /or A/$(i7), and in particular 

{Vj}j>o is orthonormal in A/$(fi) 
{(/?j}j>o is orthogonal in L2(fi), ||</5j H^^^j^-j = 
\j — > as j ^ oo 

Moreover the kernel has an eigenfunctions expansion 

oo 

which is absolutely and uniformly convergent. 

Remark 1.1. The operator T,^, is a trace-class operator, and in particular 

^Xj= I <!>{x,x) dx = (f){0) \Q\ 

This property, together with the fact that the eigenvalues accumulates in 0, 
will be useful to estimate the convergence of the truncated series to the full 
one. 

Remark 1.2. A consequence of the property (11 ■4p . which we point out for 
later use, is that Vj > 

(/, ^j)L2(n) = (/, = Xj (/, ipj)^ = {(pj, (Pj)L2(n) if, ^j)-s> V/ G A4(^^) 

1.2.3 Other inner products 

For later use we introduce also the following discrete scalar products, which 
will be of key importance in the following chapters. 

Definition 1.5 (). Let f , g be functions in A/$(i7) and X = {xi, . . . ,Xn} C 
fl a discrete set. The -scalar product is defined as 

N 

U,9) = ^f{xi)9{xi) 
1=1 
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Definition 1.6 {£2{X)). Let f, g be functions inJ\f^{Q), X — {xi, . . . ,xn} C 
Q a discrete set and W = {wi, . . . ,wn} C R a set of positive weights. The 
£2 {X) -scalar product is defined as 

N 

{f,9)e^{x) ^^Wi f{xi)g{xi) 

i=l 

It is clear from the definitions that both the products are not positive definite, 
since for each / G A/'$(f2) that vanishes on X, {f,f) = {f,f)i^{x) = 0. 
Anyway they are positive definite when restricted to 

1.3 Error bounds and stability estimates 

We recall that, for a subset 17 C M", a discrete data-sites set X C and a 
radial, positive definite kernel $ G C{Q x Q) the RBF interpolant Px[f ] to a 
function / G A/'$(f2) is computed as 

N 

^x[/](a:) = J]c, Px[f]{xi) = f{xi) yxeQ,x,eX (1.5) 

The question is how well Px[f] can approximate the sampled function /, 
i.e. if Px[f] converges to / in some given norm when the data-sites X be- 
comes dense in Q. To be more precise, one wants to know if J\f^{X) saturates 
A$(f^) for a good choice oi X C and if the process can be accomphshed 
in a stable way. 

There are two quantities used to relate the set X to these requirements: 
the fill distance 

/ixn = max min llx — Xi||2 (1.6) 
' sen Xi^x 

and the separation distance 

Clearly the shape parameter e have also an important role, since it deter- 
mines the radial amplitude of the kernel. 

The first estimate comes directly from the definition of the pointwise-error 
functional. Let be defined Vx G O as 



£,:J\f^{n)^R, £,[f]^f{x)-Px[f]{x) 
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and let V^^x denote its norm, the so-called Power Function. 
Then the basic estimate for the convergence is the following: 

Theorem 1.6. LetQ C R", let^ e C(0xf2) be a symmetric positive definite 
kernel, let X (Z ^ be a discrete set of data sites and let f e Then 

\f{x)-Px[f]{x)\^r^,xmfU ^xen (1.8) 

It can be refined observing that the interpolation operator is a projection 
with respect to the $-inner product (and then the intcrpolant is also the 
best approximation in J\f^{X) of / e A/$(0)). The estimate then becomes 

\f{x)-Px[f]{x)\^r^,x{x)\\f-Px[f]h VxeQ (1.9) 

Moreover, the Power Function can be exactly computed introducing a La- 
grange basis for A$(X): 

Proposition 1.7. For any pairwise distinct data-sites set X E Q there exist 
a Lagrange basis = {£i, . . . ,£n} for M<s>{X). This basis is such that ij{xi) = 
^ij ^hj = I, ■ ■ ■ ,N and then the intcrpolant can be written in cardinal form 
as 

N 

^x[/](x) = ^/(x,)£,(x) VxeQ 
i=i 

The expression for the intcrpolant gives 

N 

V^,x{xf = ^x,x) -^^x,xj)ei{x) (1.10) 

Using this explicit representation it is possible to bound the Power Function 
on domains that satisfies an interior cone condition, using a multivariate 
Taylor expansion. This final estimate relates the set X to the interpolation 
error: 

Theorem 1.8. Let Q C M" be a bounded set that satisfies an interior cone 
condition and let $ G C'^^{fl x fl) be a symmetric positive definite kernel. 
Then there exist positive constants Hq and C independent of x, f and 
such that \/X C Q, hx,n ^ hg, V/ e A$(il) and \/x & ft 

\f{x) - Px[f]{x)\ ^ C ■ h\^^ ■ C^Wf - PxifWU 

where C$ is a constant that depends on the derivatives of^. 
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Prom the previous estimate we can expect that the approximation error goes 
to zero as hx,n ^ 0. This is not completely true, since when the data-sites 
set X becomes too big the interpolation can be instable. 
In fact, it is possible to prove that the condition number of the kernel matrix 
A grows if the separation distance qx decreases, and this, together with a bad 
choice of the shape parameter e, can produce very instable approximants. 
Various approaches arc used to avoid this situation. A lot of efforts are made 
on the study of well-distributed data-sites set, for examples sets X such that 
the uniformity 

Qx 

PX,U = 7 

ii'X,n 

is maximized. 

Another common way to try to avoid instability, and more related on the 
liner algebra part of the method, is to choose a shape parameter e such that 
the kernel matrix is not ill-conditioned. 

Recently another method to ensure convergence and stability was presented, 
and it is described in the next chapter. 



Chapter 2 
General bases 



As shown in the previous chapter, the use of the standard basis of translates 
leads to ill-conditioned kernel matrices, and, moreover, it gives poor infor- 
mation about the selection of a "good" centers set X G Q. 
Hence it makes sense to consider different bases of A/#(fi) in order to obtain 
better results in term of stability and convergence, and such that it will be 
possible to describe in an useful way the data dependence of the subspace 



In the present chapter we will describe this kind of change of basis, focusing 
mainly on $-orthonormal basis. 

This approach was introduced in the papers [3l H], where it is the starting 
point to produce a Newton basis. 

2.1 Definition and characterization 

Let Q CW^, X = {xi, . . . ,Xn} C Q, and let Tx = {$(•, Xi),Xi G X} be the 
standard basis of translates. 

Consider another basis U = {ui G A/'<i>(fi), i = 1, . . . , N} such that 



The following Theorem gives a characterization of such bases, where Tx and 
U are expressed as row vectors: 



Af^{X). 



span{U) = span(7x) = A/$(X) 



(2.1) 



T(x) 
U{x) 



[$(x,xi), . . . , ^{x,xn)] G R- 
K(a;),...,MAr(x)] gM^ 



N 
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Theorem 2.1 (Characterization). Any basis U arises from a factorization 
of the kernel matrix 

A = Vu- Cu-' (2.2) 

where 

Vu = {uj{xi))i^i,j^N (2.3) 
and the coefficient matrix Cu is such that 

U{x) = T{x) ■ Cu (2.4) 

Proof: Let U he a basis as defined in f l2.ip . Since Ui G A/#(X) \/ Ui & U, 
there exist real coefficients {cji)i^ij<^N such that Wx E Q 

N 

Ui{x) = ^ $(x, Xj) Cji , 1 ^ i ^ N (2.5) 
i=i 

and it suffices to set Cu = (cji)i^ij^iv to obtain (12. 4p . 
Now consider the evaluation operator Ex '■ A/'$(fi) — )■ M^, 

Ex{f) = [f{x,),...J{x^)] 

that maps functions into column vectors and rows of functions into matrices. 
By definition 

E{T{x)) = {^{xi,Xj))i^ij^N = A 

E{U{x)) = iUj{x,i))i^ij^N- 

Setting Vu := E{U{x)) as in (ES]), by ([23D we have 

Vu = E{U{x)) = E{T{x) .Cu) = A-Cu 
and hence A = Vu ■ Cu^^. □ 

Using this characterization is simple to describe useful properties of the basis 
U. In particular we will be interested in bases that are orthogonal or even 
orthonormal with respect to the inner products defined in section fll.2p . The 
gramians can be computed as follows: 

Proposition 2.2 (gramians). Given a basis lA as in we have 

Gu '■= {{Ui, Uj)q,)i^ij^N = Cu^ ■ A - Cu 

'■= {{ui-,Uj))i^ij<^N = Cu^ ■ A? ■ Cu 

'■= ii^i, Uj)q(x))l^i,j^N = Vu^ -W -Vu 
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Proof: The formula for the $-gramian Gu comes from (12. 5p : 

Gu = iiUi,Uj)^)i<^ij^N = I CkiChj^{Xh,Xk) j 

\h,k=l J 



Cj -A-Q 



The and £^(X)-gramians Tu and can be directly computed using fl2.2p 
and fl^ : 

(N ^ 
^Ui{Xh)Uj{Xh) 



Vu" ■ Vu = Cj -A^-Cu 



N 



This concludes the proof. □ 

For later use it is convenient to rewrite also the Lagrange basis (II. 7p and the 
Power Function (ll.lOp using the same notation: 

Proposition 2.3 (Lagrange basis). The Lagrange basis for Af<s,{X) is de- 
scribed by the matrices 

V=I, C = A-\ G = A-\ F = / 

Proof: The statement is a direct consequence of Theorem (12. ip and Propo- 
sition (12. 2p applied to the definition (II. 7p of . □ 



Proposition 2.4 (Power function). The Power Function can be expressed 
as 

V^,x{x) = ^x,x)-U{x)-Gu'^ -U^ix) (2.6) 
= 0(0) - U{x) ■ Gu'^ ■ U^{x) yxen (2.7) 

Proof: The last Proposition gives 

L{x) := [iiix), . . . , iNix)] = T{x) ■ G = T{x) ■ 
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Now starting from the definition (ll.lOp we get 

N 

i=l 

= <l>{x,x) -T{x) ■ L^{x) 

= ^{x,x) -T{x) ■ A-^ -T^ix) 

= x) - U{x) ■ Cu-' ■ ■ {Cu-T ■ U'^ix) 

= x) - U{x) ■ Gu'^ ■ U^{x) 

= 0(0) - U{x) ■ Gu-^ ■ U'^ix) 

where we used the definitions of U{x) and G^. □ 



2.2 Interpolation and stability 

Now it is possible to express the interpolant to a given function / G Af<!,{Q) 
using this notation. 

Proposition 2.5 (hiterpolation) . The interpolant Px[f] to a function f G 
A/$(f2) on X C fl can be rewritten as 

N 

Px[f]{x) = J2Mf)M^) = U{x)-Au{f) Wxen (2.8) 
i=i 

where Au{f) = [Ai{f), . . . , Ai^{f)]'^ G is a column vector of values of 
linear functional defined by 

Au{f) = Gu-' ■ A-^ ■ Exif) = Vu''Ex{f) (2.9) 
Proof: The interpolant (11. 5p is espressed by 

N 

Px[f]{x) = Y, a^^{x,x,)=T{x)-a (2.10) 
i=i 



where a G M^, Aa = Ex{f)- Thus, according to (12. ip . 

Px[f]{x) = T{x)-a = U{x)-Gu-'-A-'-Ex{f) (2.11) 
= Uix)-Vu-'Exif) (2.12) 

This proves the statement. □ 
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Proposition 2.6 (Stability). Let p(-) be the spectral radius and let k,2{') be 
the condition number with respect to the euclidean norm \\ ■ on M^. Then 
the stability of the evaluation of Px [f] can be bounded \fx E Q as 

\Px[f]{x)\' ^ \\U{x)\\l \\Au{f)\\l < K,{Gu) 0(0) ll/lll (2.13) 
and in particular the following bounds hold: 

\\U{x)\\l ^ p{Gu)m WxeQ (2.14) 
\\Mf)\\l ^ P{Gu') ll/lll ^feU^iSl) (2.15) 

Proof: By the previous Proposition and the Holder inequality we get the 
bound 

\Px[f]{x)\^\\U{x)\\, UuU)\\2 Va:Gl^ (2.16) 

The term ||f/(a;)||2 can be bounded using (12. 6p : the Power Function is non 
negative and hence U{x) ■ Gu^^ ■ U^{x) ^ ^{x^x) = 0(0). Now, using the 
properties of the Rayleigh quotient for the symmetric matrix Gy^, we get the 
desired bound. 

Now consider the term ||A2^(/)||2: from (12. 8p it follows that 

N 

\\Px[f]h = $^A,(/) A.(/) (w,(x),w,(x))$ (2.17) 

= Al,-Gu-Au (2.18) 

and then the bound holds thanks to ||-Px[/]||$ ^ II /lU^ applying the 
same eigenvalue manipulation as above. □ 

Corollary 2.7 ($-orthonormal bases). If U is a ^ - orthonormal basis, the 
stability estimate becomes 

\Px[f]{x)\ ^ Wfh Va:Gl] (2.19) 

In particular, the value of \\U{x)\\2 for fixed x E Q and the value of ||A(/)||2 
for fixed f G A/'$(f2) are the same for all ^-orthonormal basis, and the fol- 
lowing bounds, that are not dependent on X (Z Q, hold G 

\\U{x)h^^l l|A(/)|b^||/|k (2.20) 

Proof: The bounds follow immediately from the previous Proposition in the 
case Gu = I- 
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The equahty Gu = I and the Power Function equation (ll.lOp gives also, 
Va; e n, 

\\U{x)\\2 = U{x) ■ U{xf = ^{x,x) -r^,x{x) ^ ^{x,x) (2.21) 

and since V<i,^x{x) is independent of the particular basis chosen, so is the 
value ||f/(x)||2. 

Finally (I2.17P proves that || A(/) ||2 is independent on U if it is $-orthonormal. 
□ 



2.3 Orthonormal bases 

The previous Corollary suggests the opportunity of using orthonormal bases. 
The following Theorems give a complete characterization of bases U as de- 
fined in (12. ip that are orthonormal with respect to the considered inner prod- 
ucts. 

Theorem 2.8 ($-orthonormal bases). Each ^-orthonormal basis U arises 
from a decomposition 

A = ■ B, (2.22) 

with Vu = B^ and Cu = B'^ 

Proof: Since Gu = Cu^ ■ A ■ Cu hj Theorem (Q, the condition Gu = I 
is equivalent to A = {Cu^)~^ ■ Cu^^, and the statement holds according to 
th.diH). □ 

It is also useful for the next chapter to characterize bases that are orthonor- 
mal with respect to the discrete scalar product introduced in chapter [T] 

Theorem 2.9 (-orthonormal bases) . Each -orthonormal basis U arises from 
a decomposition 

A = Q-B, Q'^ ■Q = I (2.23) 

with Vu = Q and Cu = B. 



Proof: Since Tu = Cu^ ■ A^ ■ Cu by Theorem (Q, the condition Vu = I 
implies that A ■ Cu = is orthogonal. □ 
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Remark 2.1. // the basis U is ^-orthonormal, the junctionals Aj in (12. 5 p 
are obviously (^-scalar products, and so the interpolant takes the form 

N 

Px[f]{x) = Y,if^^MA^) (2-24) 

indeed 

N 

A,(/) = {Vu-'■Ex{f))J = {Cu^■Ex{f))j = J2^vfi^^) 

i=l 

\i=l / $ 

The above fact is clear since the interpolation operator is a projection operator 
on A/$(X) with respect to (■, ■)#, andU is a ^-orthonormal basis for M<i,{X) . 



Chapter 3 



Weighted SVD bases 



This chapter deals with the main issue of the present work. 
We will introduce a particular basis for the native space A/$(fi), and we will 
present the reason to threat it. In particular the discussion points out the sta- 
bility and convergence rate of the interpolant and the weighted least-squares 
approximant based on it. 

The main idea is to discretize the "natural" bases described in Theorem 11.51 
The numerical approximation of such basis gives a point-dependent, discrete 
basis which can be described using the notations introduced in the previous 
chapter. 

The interest on this basis is that it preserves the properties of the continuous 
basis in a discrete setting, and in particual is a complete, $-orthonormal and 
£^(X)-orthogonal basis for A/'#(fi). 

The idea for using this approach comes again from the paper [4J , where it is 
suggested as a possible way to create a connection between the continuous 
basis and the discrete one. 

3.1 Definition and basic properties 

Consider a cubature rule (X, yV)^, X G N, on Q, i.e. a set of points 
X = {xj}^^i C Q and a set of positive weights W = {wj}^^^ such that 



We can approximate the operator (II. 5p for each eigenvalue Xj using the Nys- 
trom method based on the above cubature rule. A complete description of 
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the method is present in [Ij. 

The operator can be evaluated on X, 

Xjifijixi) = / y)^j{y)dy Vi = 1, . . . , iV, Vj > 
Jn 

and then discretized using the cubature rule as 

N 

\j(fj{xi) = ^^{xi,Xh)'^jixh)wh = 1,...,N (3.2) 

h=l 

Now, setting W = diag{wj), it suffices to solve the following discrete eigen- 
value problem in order to find the desired approximation of the operator's 
eigenvalues and eigenfunctions (evaluated on X): 

Xv = {A-W)v. 

However this approach does not lead directly to a connection between the 
discretized version of the basis of Theorem 11.51 and a basis of the subspace 
A/'$(X) as defined in equation fll.2p . since it involves a scaled version A-W of 
the kernel matrix which is no more symmetric and that cannot be described 
as a factorization of A, as required by the construction made in the previous 
chapter. 

A solution is to rewrite fl3.2p using the positivity of the weights as 

N 

\j{^/wlipj{xi)) = ^(v/^^$(xi,x?,)v/?^)(v/^V5j(a;h)) V2,j = 1,...,A^ 

h=l 

(3.3) 

and then to consider the corresponding eigenvalue problem 

A (yW ■ = (^VW ■ A ■ VW^ (^VW ■ 

which is equivalent to the previous one, now involving the symmetric and 
positive definite matrix A^r := y/W ■ A ■ \/W. 

In particular this matrix is normal, and then a singular value decomposition 
of it is also an unitary diagonalization. 

Motivated by this approach we can introduce a new basis for A/$(X), de- 
scribed in terms of the notation given in Theorem 12.11 
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Definition 3.1 (Weighted Svd basis). A weighted svd basis U is a basis for 
A/'$(X) characterized by the following matrices: 

Vu = VW^ -Q-^, Cu = Vw -Q-^-' 

where 

\fW ■ A-^ = Q-Y? -Q^ 

is a singular value decomposition (and an unitary diagonalization) of the 
scaled kernel matrix Aw, W = diag{wj), where {wj}^^^ are the weigths of a 
cubature rule {X, W)n- 

Remark 3.1. For what follows it is important to require that XljLi "^i = 1^1; 
which is equivalent to ask that the cubature rule {X, W)n is exact at least 
for the constant functions, VA^ G N. In the next this property will be assumed 
to hold. 

3.1.1 Preservation of the properties of the continuous 
basis 

As expected, this basis preserves, in a discretized sense, some interesting 
properties of the "natural" one: 

Proposition 3.1. Each weighted svd basis lA has the following properties: 

1. Uj{x) = j,Y.tiWiUj{xi)^x,Xi) ^ ^T^[uj\{x), V 1 ^ J ^ iV, Vx G 

i j 

2. U is ^-orthonormal 

3. U is i^{X)- orthogonal 

4- ll^illf-(x) = '^'^j ^ ^ 

Proof: Properties 2-3-4 can be proved using the gramians as in Prop. 

Gu = Cu^ ■ A- Cu = Cu^ - Vu = ^-' ■ ■ VW ■ VW^ ■ Q -J: = I 
Tu = Vu^ -W ■Vu = ^-Q^ ■ \/W-^ ■ W ■ VW^ • Q ■ S = 



26 



Chapter 3: Weighted SVD bases 



To prove the first one it sufiices to use the definition of Cu and Vu'- indeed 
from the definition of V^, if we denote the j-th column of Vu as Yuj, we get 

^ Ex{uj) = Vuj = VW qjaj 
and the last equality allows to compute each component of Cu as 

and then the Definition of U gives 

N N 

'"j(^) = X] ^i^^ ^i) {Cu)i,j = ^{X, Xi) Uj{Xi) = 
i=l i=l ^3 

1 ^ 
'^j i=l 

where the last term is cleary the approximation of ^^[Mj] given by the rule 
(X, W)jv, divided by the corresponding discrete eigenvalue 
Property 5 is due to a linear-algebra relation: we recall that 

Vw ■ A ■ Vw = g • • 

and since the trace of a square matrix is equal to the sum of its eigenvalues, 
we get 

N N N 

j=i i=i j=i 
This concludes the proof. □ 



Remark 3.2. In this context, where {wj}jLi are cuhature weights, the £3 (^)- 
scalar product is a discretization af the L2{^) -scalar product. Indeed 

if,9)h{n) = / f{x)g{x)dx ^^Wjf{xj)g{xj) = {f,g)j^^x) (3-4) 

and in this sense the property 3 is a discretized version of the corresponding 
property of the continuous basis. 
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In the same way, property 5 states that 

N N „ 

i=i i=i 

and exactly the same relation holds for the continuos eigenvalues if N ^ oo, 
as pointed out in Remark (II. ip . In this case the integral is exactly approxi- 
mated by the cubature rule since it is supposed to be exact at least for constant 
functions. 

3.1.2 Completeness 

The basis enjoys another important property: if the data set X if sufficiently 
big, the basis is complete in the native space. Although at the moment it is 
quite useless in a practical sense, it will be a key property when combined 
with the stability considerations which follows. 

Remark 3.3. A function f E A/$(i7) belongs to A/$(X)"'" if and only if it 
vanishes on the discrete set X G Q, i.e. 

^^{X)^ = {/ G A4(^]) : fix,) = Vx, e X} 

Indeed if f E A/$(X)"'", i.e. {f,Uj)^ = Vwj G U, then using Remark (12. ip 

N 

Px[fKx) = Y,{f,u,)^ u,{x)^0 
i=i 

andf = Px[f] = onX. 

On the other hand, if f = on X, then Px[f] = Q on X, and since U is a 
orthonormal and hence linearly independent set, we have {f,Uj)q> = Vwj G 
U. 

Proposition 3.2 (Completeness). If the data set X is dense in Q, the basis 
U is a complete basis for N'<s>{VL) . 

Proof: To prove the completeness it suffices to show that if a function 
/ G A$(f2) is orthogonal to each basis element Uj G W, it is the null function. 
This fact follows immediatly from the previous Proposition, the denseness of 
X mVt and the embedding M<s>{VL) ^C{VL). □ 
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3.2 Interpolation and approximation 

In this section we will describe the interpolation and approximation process 
based on the weighted svd basis. This allows to get error bounds and stability 
estimates that explains the use of such basis. 

3.2.1 Interpolation 

We recall some basic facts about the interpolation process by kernels. Most 
of all are the same for each $-orthonormal basis, as shown in the previous 
chapter. 

Proposition 3.3 (Interpolant). Let f G A/'$(fi) and X C ^l, X = {xj}jLi. 
Then the interpolant Px[f] of f on X based on the basis lA can be expressed 
as 

N 

Px[f]{x) = Y,U.u,)^u,{x) WxeQ (3.5) 
i=i 

Proposition 3.4 (Power function). The Power Function V<s>^x{x) takes the 
form 

N 

P$,x(x)' = 0(O)-5^^i,(x)2 (3.6) 

i=i 

Proof: The result can be proved using the matrix form of the Power Function 
given in (12. 4p for the special case Gu = I, i.e. for an orthonormal basis, but 
it is more clear to give a direct proof. 

By definition, the Power Function is the norm of the error functional 

: Af^{Q) M 

that maps every function / G A/$(r2) to the interpolation error at a given 
point X & Q, i.e. 

£,[f]ix) = fix)-Px[f]ix). 

Using the reproducing property of the kernel and the Proposition (13. 3p . we 
can rewrite Sx as 

N 

^x[f]{x) = (/,$(-,x))$ - (/,^Mj(>j(x))$ 

i=i 

TV 

i=i 
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and since the norm of equals the norm of its Riesz representer, we can 
conclude that 

N 

V^^xf = ||$(-,x)-5^K,(-K(x)||| 

N N 
= ||$(-,X)|||+ ^ Ui{x)Uj{x){Ui,Uj)^ - 2^Uj{x){^{-,x),Uj)^ 
i,j=l j=l 

N 

This concludes the proof. □ 

Remark 3.4. The proof gives also an expansion of the kernel when restricted 
to act on functions inJ\f^{X), 

N 

Indeed V/ e J\f^{X) we have f = Uj)^Uj{x), hence Vx e O 

N N 

•))$ = f{x) = = {f,J2uji^hji-)h 

j=i j=i 

3.2.2 Weighted least-squares approximation 

Another common approach to reconstruct a function from its value at a dis- 
crete set X C is to approximate it in the least-squares sense. 
The idea is to use the same sampled data used for the interpolation process, 
but to relax the interpolation condition and then to project the function into 
a subspace of A/$(X), rather than into the full subspace. 
In this way it is possible to reduce the computational cost of the process, 
since a smaller basis W is involved, and to obtain better results in terms 
of stability. The hope is to gain these benefits without a serious loss of con- 
vergence speed. 

This kind of approximation is also meaningful when the data values are sup- 
posed to be affected by noise, and then an exact recostruction of them makes 
no sense. 

In this setting, moreover, the properties of the weighted svd basis provides 
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an additional reason to consider this kind of approximation: since the eigen- 
values of the operator T$ decays very rapidly to zero, and the discrete one 
which approximates them are the discrete norms of the basis involved, it is 
reasonable to consider only the most significant of them. 

In fact, we are interested in a weighted least-squares approximation defined 
as follows: 

Definition 3.2. Given a function f G A/$(fi), a discrete subset X G Q, 
a set of cubature weights W associated with X, a weighted svd basis U for 
M^{X) and a natural number M ^ N = \X\, the weighted least-squares 
approximation of order M of f is the function Am [/] that satisfies the codition 

Aa/[/] = argmin \\f - g\\q{x) 

g(l^span{ui,...,Uf,f} 

The reason to use the first basis element is the fact that they are associated 
to the bigger singular values of the scaled kernel matrix A]^, and then they 
allows a more stable and accurate reconstruction of /. This relation will be 
stated exactly later when dealing with error bounds. 

In order to compute the approximant, we start to point out a relation between 
the and the £^(X)- scalar product, that is again a discretized version of 
a property of the continuous base {fj}j>o, stated in Remark (II. 2p : 

Proposition 3.5. For all f G A/$(i7) and for each Uj G U, the following 
relation between the $- and the £2 {X)- scalar products holds: 



1 (/) "^j^g^i^X) 



2J 



Proof: Using property 1 of Proposition f l3.ip . by direct calculations we get 
the statement: 



\ i i=l / $ i i=l 

1 ^ 1 

J j=i J 
where crj = {ujjUj)^^!^-^-^ as shown in the same Proposition. □ 



Now it is possible to express the approximation as a function in A$(X): 
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Proposition 3.6 (Weighted least-squares approximant). In the notation of 
Definition f l3.2p . the approximant is given by 

M f n \ M 

MfK^) = E = Y,{f,uM,{x) Vx G ^] (3.7) 

i=i j=i 

and then Aj\/[/] is nothing else but a truncation of Px[f]- 

Proof: The second term is simply the orthogonal projection of / into the 
space generated by {ui, . . . ,Um}, since the base is orthogonal with respect to 
(■, ■)eif(x) and = ^"jy and it is obviously the element of span{ui, . . . , um} 

that minimizes the ^^(X)-distance from /. 

The third term is derived as a direct implication of the previous Proposition. 
□ 



Remark 3.5. We point out that the previous proposition proves that the 
weighted least-squares approximant Aj\/[/] can be obtained from the inter- 
polant Px[f] simply truncating the last N — M coefficients and basis, the 
ones corresponding to the smallest singular values cr|. This is in opposition 
to the case of the standard basis of translates, in which choosing the bases 
to neglect corresponds to the choice of a restricted subset Y G X where to 
center the kernel, and in general this is a more difficult task. 
Moreover, motivated from the results in the following sections, this procedure 
can be automated when dealing with a situation in which very small singu- 
lar values are expected. In this setting one can leave out the bases which 
correspond to singular values less than a fixed tolerance in order to avoid 
numerical instability, and then skipping automatically from interpolation to 
least-squares approximation. From a linear algebra point of view, this cor- 
responds to solve the weighted linear system associated to the interpolation 
problem using a total least-squares method. 

Using this direct expression for the approximant, it is easy to compute the 
equivalent of the Power Function for Aj;/ [/]: 

Proposition 3.7. The norm of the error functional S^^ for A^j[f] takes the 
form 

M 

i=i 

Proof: The proof is the same of the one for the Power Function, where the 
interpolant is truncated after M terms. □ 
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3.2.3 Stability 



In the case of the interpolant, we recaU the resuh of the previous chapter, 
that holds for each $-ort honor mal basis: 

Proposition 3.8. Since U is a ^-orthonormal basis, it is possible to bound 
the absolute value of the interpolant as 



{3.1 



\Px[f]{x)\^./m\\fU VxGr] 

The result can be refined for the particular case of an svd-basis: 
Proposition 3.9. For an svd bases U, the following stability estimates holds: 



\Px[f]ix)\ ^ WfU, |Am[/](x)| ^ Wfh vx G ^] (3.9) 

where in particular 



\Px[f]{x)\ ^ 



N 



. E^^-(^)' 11/11*' \^M[f]{x)\^ 
\ i=l 



M 



11/11^ VxGfi 
(3.10) 



Proof: It suffices to use the Cauchy-Schwarz inequality for (-, and the 
$-orthonarmality of U: 



\Px[f]ix)\ 



N 



j=l 
N 



N 



l/lh 



N 



The same for Am[/], where the sum stops at M. 
Now the equality (13 .4^ . 

N 

P$,x(a;)' = 0(O)-^M,(x)2 

i=i 

gives exactly the general estimate, since obviously the square of the Power 
Function is non negative. □ 
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3.2.4 Error bounds 

Now we can prove some convergence estimates on the described approxi- 
mants. 

It is important to remark that, in the case of the interpolant, there are no 
difference in the use of a particular basis, since the spanned subspace A/'$(X) 
in which we project a function / G A/$(r2) clearly does not depend on the 
choosen basis. 

On the other hand, the fact that we are using this kind of basis allows us to 
relate the bounds to the continuous eigenvalues and to their eigenfunctions 
{fj}j>o> which forms a complete basis for A/'$(fi) and which are related in 
a close way to the used kernel $. This justifies the choice of sampling the 
function / on a data set X that forms together with a set of weights W a 
good cubature rule. 

Moreover, this remark remains valid in the case of the weighted least-squares 
approximant, where in addition the connection between the discrete and the 
continuos eigenvaues motivates the use of a reduced subspace of A/'$(X). 

The first error bound is a simple adaptation of the estimate (II. 6p for the 
case of a $-orthonormal basis, and then in particular for a weighted svd 
basis: 

Proposition 3.10. Let^l C M", let^ E C(f2xf2) be a radial positive definite 
kernel, and X G Q. Then for each f G Af<i,{Q) and for all x E Q, 



Remark 3.6. ^45 shown in Chapter ([T]), in the above estimate the norm of 
f can be replaced with the norm of f — Px [f] ■ In this case, since the basis is 
^-orthonormal, the following equalities holds: 



11/ - Px[f]\\l = ll/lll - ||Px[/]ll| = ll/lll - J2^f,u,)l (3.12) 



3=1 

The same remains valid also in the next estimates which are derived from 
this one. 

Using this estimate we can compute a bound on the interpolation and ap- 
proximation error using the L2(fi)- norm. It is of interest to estimate the 
reconstruction precision in such norm because it gives not only a punctual, 
"worst-case" bound on the error, but also a global one. 




(3.11) 



N 
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Proposition 3.11. Let Q C be compact, let $ G C{Q x Q) be a radial 
positive definite kernel and X G Q. Then for each f G f/<i,{Q) 



N N 



Proof: From the embedding Theorem fll.4p we know that A/'$(f2) )■ L2(f2), 
hence both sides of fl3.1ip have finite L2(fi)-norm. Thus we can integrate 
over Q the first bound and get 

ii/--p-vi/iiii,,n, < J (^m)-'t.u,{xr^ wfwidx 



Uj{xfdx I ll/lll 



n 

N 



-0(0) '"^"2 



Now we can estimate the L2(f2)-norms as follows: using the simple relations 
ll</5jllL2(fi) = ll"illL(n) + yj - %llL2(f7) + 2 {^j - Uj,Uj)L2(^a) 

and 

hjhiin) ^ WVj -Uj\\L2{n) + ||v^illL2(n) 

we get Vj = 1, . . . , 

-IMhin) = -Ilv^illi2(n) + llv^i -^illiacn) +2 i'^j -Uj,Uj)L,(n) 

< -||^illL2(n) + ll^i - WillL(n) + 2 yj - Uj\\L,(n) \\uj\\L,in) 

< -||^illL2(n) + WVj - WjlUaCn) (3 llv'j - Mj||L2(n) + ||</'il|L2(f7)) 

and from fll.Sp we know that ||<y5|||^(.j^^ = Xj. To conclude it suffices to bound 
in an obvious way the right-hand side term as 



3 yj-Uj\\L2in) + yj\\L2{n) ^ 3 ( max - UjWL^in) ] + y^i =■ 

\^j=l,...,N J 

since the eigenvalues {Aj}j>o are non increasing. □ 

The same estimate remains valid for the approximant Aa/ [/] if iV is replaced 
by M, as a consequence of the Proposition (13. 7p : 
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Proposition 3.12. Let Q C M."- be compact, let $ G C{Q x Q) be a radial 
positive definite kernel, X G Q, \X\ = N E N and M ^ N. Then for each 

(M M \ 

m ■ 0(0) - 5^ A, + C$,n,x,w • E \\n, - v^jhm ||/||| 
j=i j=i J 

We point out that these estimates involve two terms: the first one is 

N 

and it is related only on the kernel, the domain and the dimension G N of 
the approximation subspace A/$(X). From the Remark (11. ip on the Theorem 
(ll.Sp we know that for — )■ oo the above term vanishes, and moreover the 
eigenvalues are positive and orderer in a decreasing way. Hence this term 
measures how the truncated series approximates the full one, or in other 
words how the degenerate kernel 

N 

approximates the original kernel $(x, y). 
The second term is 

N 

C'i>,n,x,w ■ IWj — 'fj\\L2{n) 
i=i 

and it depends also on the cubature rule (X, W)^- It measures the conver- 
gence rate of the Nystom method based on the rule, and gives informations 
on how well the discrete basis U approximates the continuous one. 

Remark 3.7. It can be useful to refer to the cubature error in terms of the 
II ■ Woo-iT'Orm. It can be done in an obvious way since Q is compact and then 

N N 

j=l j=l 

For the weighted least-squares approximant, it make sense to consider also 
another type of error measurement. Indeed in this case the data-sites set 
X C is not used to interpolate the function /, but works as a sample set, 
so the pointwise distance between / and Am[/] on X is not zero. We can 
bound this quantity as shown in the next Proposition: 
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Proposition 3.13. LetQ C M", let ^ G C{QxQ) be a radial positive definite 
kernel, X cQ, \X\ = N, and M < N. Then for each f e A/'$(ri) 

1I/-Aa/[/]||.^(x)< ( aA WfU (3.13) 

\j=M+l ) 

Proof: We start again from the bound in Proposition fl3.10p : in this case the 
finiteness of the £^(X)-norm of both sides is obvious, since every function 
involved in the estimate is continuous on ^2. Acting as in the previous Theo- 
rem we can evaluate both sides in X and sum the obtained values, weighted 
with the weights W. We get 

N / M 

II/-Am[/]|||(x) ^ II/IIIE^^ U(0)-5^n,(a;.)^ 

i=i \ i=i 



^ N M N \ 

0(0) ^ - ^ ^ u;i Uj{x^f\ 

\ 1=1 j=l i=l / 



N M 
I I 0(O)5^Wi-^||Mj-||^«,(^) 
i=l j=l 



(M 



Where we used property 4 of Proposition (13. ip to compute the £3 

of the basis functions and Remark (13. ip to compute the sum of the weights. 

To get the desired bound we recall that 

N 

again as stated in Proposition (13. ip . □ 



Remark 3.8. The last result can be interpreted also in another way. In 
fact, it gives a bound on how the weighted least-squares approximant and the 
interpolant differs on the data-sites set X. Indeed, since f{xi) = Px[f]{xi) 
\/Xi e X, we get V/ e ^4(1]) 

\\Px[f^-^M[f]\\^,ix)^[Y.''^ 11/11* (3-14) 

\j=M+l / 
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Clearly this estimate doesn't give informations on the distance between the 
two approximants on the set Q\X . However this quantity can be computed 
as in the above estimates: 



\Px[f]{x) - AM[f]{x)\ 



N M 

i=i j=i 

N 

{f,Uj)^Uj{x) 

j=M+l 
( 

^ ii/ik E "^•(^)' 

\j=Af+l 

and the partial sum can be bounded as in the previous Theorems. 

Remark 3.9. The trade-off principle between stability and convergence ex- 
plained in Chapter ([1]) can be viewed in this context as follows: we have 

( 

\j=i 

\Px[f]{x) - f{x)\' ^ (^m-Y.uA^y^ 

and the same for Ajv/[/] if N is replaced by M < N . 

Hence for convergent approximant, namely for approximant for which the 
power function converges to zero, we have necessarily 

N 

j2u,{x)'^m 

that is, the constants in the stability bounds in Proposition fl3.9p are maxi- 
mized. 



Chapter 4 

Numerical experiments 



In this chapter we will present some numerical experiments which show the 
actual behaviour of our basis. 

We will point out different features that can be relevant in the choice of 
a method, and in particular the ones concerning stability and convergence 
speed. 

The following tests take place in the setting described in Section (14. ip . 

The code is mainly written in Matlab, using in some parts the software 
present in the book [2]. The most performance-critical parts are written 
in C++, using the MatlabMEX interface j9] and the linear-algebra library 
Eigen jlO]. 

4.1 General setting 

The approximant strictly depends on the set f2 C M", on the kernel $ G 
C{fl X Q), on the data-sites set X (Z fl and on the function / : f2 — )■ M that 
we try to reconstruct. In this section we describe the general choices made 
on this elements for our experiments. 

4.1.1 Approximation domain 

The sets ^] C used are the following: 

• The square Qi = [0, 1] x [0, 1] 

• The disk Q2 with center C = (|, |) and radius R = \ 
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Figure 4.1: The cutted disk and the lens as defined in the Section f l4.1.ip . with 
an example of trigonometric gaussian points, as defined in Section (I4.1.2P 

• The cutted disk Q^, i.e. the unit disk centered in zero with the third 
quadrant cutted away. It is the domain depicted on the left in Figure 
ICT 

• The lens defined as the intersection of two disk with centers C = 
^—^,0^ and c = ^"^,0^ and radii R = r = 1. It is the domain 
depicted on the right in Figure 14.11 



4.1.2 Data sites 

For the standard basis of translates we use two distributions of points: equally 
spaced points and Halton points. The Halton points are used since they are 
well distributed without being equally spaced. 

For our basis we need a data-sites set X C such that (X, W)^ is a 
cubature rule. We use product Gauss-Legendre points for the square Qi, and 
trigonometric gaussian points on the other sets in M^. 

The latter set of cubature points was recently presented in the papers [6l [5], 
and can be obtained for a wide class of domains, much more general than the 
one used here. We use them because they provide a high-accuracy cubature 
rule while being sufficiently uniform in Q. Matlab functions to compute this 
points can be found in the site j8]. 

4.1.3 Test functions 

The functions we try to reconstruct are typical test functions in the context 
of approximation theory: 

• the bivariate Franke function fp 



Chapter 4: Numerical experiments 



41 



• an oscillatory function fo{x, y) = cos (20 (x + y)) 

• a function with a derivative discontinuity at x = y, fs{x, y) = e"^'~^"^ — 1 

• a function fj^ belonging to the native space of the gaussian kernel, 
obtained as a linear combination of the kernel centered on some points 
in Q, for a fixed shape parameter. We use them to test the behaviour 
of the approximant for functions in the native space. 

4.1.4 Kernels 

We use three different kernels among the ones described in Table f ll.ip . 
The choice is motivated from the different behaviour of the eigenvalues {Aj}j>o 
of the integral operator T$ associated with these basis functions. Indeed, 
altough we know from Theorem (II. 5p that the continuous eigenvalues accu- 
mulate to zero, the speed in which they decay is clearly not the same for 
different kernels. 

The basis functions are the gaussian (fast decay to zero), the IMQ (slower 

decay) and the 3MAT (slow decay) . Nevertheless, we point out that also the 

choice of the shape parameter e strongly influences this speed. 

We can expect that this difference reflects on the approximation and on its 

stability. 

4.2 Comparison between the interpolant and 
the weighted least- squares approximant 

As pointed out in the Remark (13. Sp . we know that we can compute the 
weighted least-squares approximant as a truncation of the interpolant. 
This reduction increases the error residual as shown in Proposition ( I3.12p . but 
in the cases in which the smaller eigenvalues are under a certain tolerance, 
we can expect that a truncation does not affect too much the approxima- 
tion capability. Furthermore, altough Proposition (13.91) proves the stability 
of our basis, in some limit situations we can expect that the influence of 
the smallest eigenvalues produces numerical instability that cannot be com- 
pletely controlled. 

In the next example we compare the approximation error produced using the 
full interpolant and some reduced weighted least-squares approximant. We 
reconstruct the oscillatory function fo on the disk ^2 using the three kernels 
with e = 1,4,9, starting from 600 trigonometric gaussian centers, and then 
truncating the basis for M G {0, 20, . . . , 600}. 
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Figure 4.2: Eigenvalues for the gaussian (top left), the IMQ (top right) and 
the 3MAT (bottom) kernels for shape parameter e = 1,4,9, computed on 
the disk Q2 using the Nystrom method based on 50^ trigonometric Gauss- 
Legendre cubature points and weights. 
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Figure 4.3: RMS error for the reconstruction of fo on Q2 using Am[/] for 
different Ms and different shape parameters, using the gaussian kernel (top 
left), the IMQ (top right) and the 3MAT kernel (bottom). 



To measure the accuracy of the reproduction obtained with this process, we 
compute the root-mean-square errors (RMSE) on a uniform grid. Figure 
(14. 3 p shows the results obtained. 

The results reflect the consideration on the eigenvalues made in the last 
section. Indeed, we can see that for the 3MAT kernel the interpolant re- 
mains stable for each e, the last iterations for e = 1 apart, the IMQ becomes 
instable for 6 = 1, while the gaussian presents some instability also for e = A. 
In the instable cases, there is a clear gain using a truncated version of the 
interpolant, that is a least-squares approximant AAf[/] for some M. Table 
(14. 2 p shows the index M such that Aj\/[/] provides the best approximation 

of /o. 

A special situation occurs for the gaussian with e = 1, where the reducing 
process does not suffices to avoid instability, as expected from the distribu- 
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e = 1 


e = 4 


e = 9 


Gaussian 


100 


340 


500 


IMQ 


180 


580 


580 


3MAT 


460 


560 


580 



Table 4.1: Optimal Ms for different kernels and shape parameter, i.e. indexes 
such that the weighted least-squares approximant Am[/] provides the best 
approximation of the test function fo on ^2 . 

tion of the eigenvalues shown in top-left Figure (14. 2p : for this parameter the 
eigenvalues are almost all under the machine precision. Moreover, for e = 1 
the gaussian becomes too fiat, and then there is no hope to reconstruct an 
oscillatory function. 

4.3 Comparison with the standard basis 

In the following tests we compare the approximations obtained from the 
standard basis of translates and from our basis. 

Since acoording to the Section (11. Sp we know that the stability of the standard 
basis is strictly related to the shape parameter e, at first we compare the 
two methods for different fixed values of e and different kernels, considering 
situations in which the standard interpolant becomes seriously instable as 
well as more stable cases. Then we repeat the same tests for an optimized 
shape parameter e*. 

4.3.1 Fixed shape parameter 

In this example we try to reconstruct the Franke function fp on the lens 
using the IMQ kernel. 

The test compares the results obtained from the interpolant based on the 
standard basis centered on an uniform grid and the one based on our basis, 
centered on a trigonometric gauss set. The reconstruction is repeated for 
e = 1,4,9 and for data sites sets Xn C 1^4, with = \Xn\ < 1000. 
The RMS errors are reported in Figure (14.41) . 

We can see that in the stable case, namely for e = 9, there is only a small 
difference between the two basis, altough for > 500 the standard inter- 
polant does not gain accuracy. 

For e = 1,4, altough for small data sets Xn the two basis does not behave 
much different, when A^ becomes too big the standard basis becomes instable 
and a growing of the data-sites set does not lead to a more accurate recon- 
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Figure 4.4: RMS error for the reconstruction of fp on the lens using the 
IMQ kernel with the standard basis and our basis. Test for different shape 
parameter: e — 1 (top left), e — A (top right) and £ = 9 (bottom). 



46 Chapter 4: Numerical experiments 




100 200 300 400 500 600 700 800 900 1000 



Figure 4.5: RMS errors for the reconstruction of fp on the lens ^4 using the 
IMQ kernel and e = 1, using the interpolant based on the standard basis and 
the weighted least-squares approximant Aa/[/] with M such that cr^ < 10^^^ 

struction. On the other hand, the interpolant based on our basis presents 
a convergent behaviour for each shape parameter, even if it is also clearly 
influenced in the rate of convergence. 

This feature can be useful since, at least in the considered cases, there is no 
need to choose a particular e to guarantee convergence, even if slow. 

Furthermore, when a small shape parameter influences too much the sta- 
bility of the interpolant, we can use the reduced weighted last-squares ap- 
proximant AmIJ], as discussed in the previous section. The approximation 
process for £ = 1 is repeated using Ajv/ [/] instead of Px[f], with M such that 
o"Af < 10^^''. The result is shown in Figure (14. 5p . The aproximant is clearly 
more stable, while the convergence rate is not reduced. 

4.3.2 Optimized shape parameter 

A possible solution for the instability of the standar basis is to optimize the 
shape parameter. In practice, for a fixed point distribution, a fixed kernel 
and a fixed test function, one tries to find the parameter that minimizes the 
residual error. 

In the following examples we realize this optimization using the so-called 
leave one out method. The idea is to compute the interpolant Px[f] on 
the full set X C i7 and the interpolants P[f]i on the reduced sets Xi = 
X \ {xi\ V i G {1, . . . , A^}, for different shape parameter e G -E, -E C M, and 
then to choose the optimal e* defined as 

e* = argmin max \Px[f]{xi) - P[f]i{xi)\ 
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Figure 4.6: RMS error for the reconstruction of fx on the square fii using the 
gaussian kernel with the standard basis and an optimized shape parameter 
e* and our basis with e = 4. The standard interpolant is computed using 
equally spaced points (on the left) and Halton's points (on the right). Our 
basis is truncated at M such that aM < 10""^'' 



We remark that this optimization is quite expensive in terms of computa- 
tional time, and cannot be performed once for all, but has to be repeated if 
the data-sites set increases. Moreover, there are cases in which a particular 
choice of e is motivated by theoretical reason. 

To examine this situation, we use as a test function an element of the native 
space of the gaussian $4(0;,?/) := exp(— 4^||a; — ?/||2) on the square Qi, i.e. the 
function 



f^(x) = -2<l>4(x, (0.5, 0.5)) + <l>4(x, (0, 0)) + 3<l>4(x, (0.7, 0.7)) Vx G [0, 1]^ 



The RMS errors are plotted in Figure fl4.6p . using uniform points (on the 
left) and Halton points (on the right) as centers of the standard basis. It 
is clear that a good choice of the shape parameter reduces the instability of 
the standard interpolant, altough it does not suffices to avoid it completely. 
On the other hand, the stability of our basis, together with the truncation 
at M such that ctm < 10^^^, allows to use the "right" shape parameter for 
each number of centers, and this leads to an approximant that converges to 
the sampled function with a tolerance near to the machine precision. Table 
fl4.3.2p shows the RMS errors for different numbers of data sites, together 
with the optimal parameter e* selected by the leave-one-out optimization. 



48 



Chapter 4: Numerical experiments 



N 


196 


324 


529 


729 


900 


Std-e 

e* 


1.05 ■ 10-^ 
3.75 


5.23 ■ 10-1° 
3.81 


3.17-10-12 
3.84 


7.15-10-12 
3.87 


1.30 ■ 10-12 
3.98 


Std - H 

e* 


3.30 ■ 10-^ 
3.75 


9.31 • 10-9 
3.84 


7.12 ■ 10-11 
3.89 


1.56 ■ 10-11 
3.92 


1.59 ■ 10-12 
3.95 


W-Svd 


7.37- 10-s 


2.23 ■ 10-11 


3.48 ■ 10-15 


6.08-10-15 


6.37-10-15 



Table 4.2: RMS errors for the approximation described in Section (I4.3.2P 
obtained using our basis (W-Svd), the standard basis centered on equally 
spaced points (Std - e) and on the Halton points (Std - H), together with the 
optimal shape parameter used for the standard basis. Values for iV = |X| as 
in the first row. 

4.4 Comparison with the Newton basis 

In the paper P] the general change of basis described in the Chapter ([2]) 
was the starting point to create a Newton basis for the native space A/'$(i7). 
Among other properties partially enjoied also by our basis, the Newton ba- 
sis can be computed recursively, i.e. if we add to X a further sample point 
Xtv+i ^ AT it suffices to compute the basis element corresponding to the new 
point. Moreover, an adaptive point-selection algorithm is provided to choose 
the point to add using also the information given by the sampled values of 
the test function. 

Matlab programs for the adaptive calculation of the interpolant based on 
this basis can be found in |11] . 

To compare the two bases we try to reconstruct the function fs on the cutted 
disk Q3 using the gaussian kernel and a Wendland kernel W21. We choose to 
use the latter since it is already present in the mentioned Matlab functions 
and because it provides interesting results which are explained in the next. 
In both cases the adaptive algorithm is able to detect the derivative discon- 
tinuity at X = y, and to concentrate the data sites near this area. 

The test for the first kernel with e = A shows a much better behaviour 
of the Newton basis. Indeed, a small number of points suffices to introduce 
high instability in our basis. This can be a consequence of the distribution 
of the data-sites depicted in Figure fl4.ip . which are too concentrated in zero 
to produce a well-conditioned kernel matrix. To avoid this, we repeat the 
experiment using a weighted least-squares approximant Am[/] with the M 
that provides the best approximation, namely M such that ctm < 10-i°, and 
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Figure 4.7: Absolute error for the reconstruction of fg on the domain ^3 
using the Newton basis and our basis (dotted lines). The kernel used are the 
Wendland's W21 with e = 2 (first row), using different starting grid for the 
Newton basis, and the gaussian with e = 4 in the second row (the error for 
the full interpolant is cutted from the plot after it becomes bigger than 10^). 

in this case the residual decreases as grows. The maximal absolute erros 
for ^ 625 are depicted in the bottom of Figure (14. 7|) . 

The second kernel is used with e = 2, and in this case our basis and the 
Newton basis provides an almost equivalent decrease of the maximal abso- 
lute error in the range under consideration, as reported in the top left of 
Figure fHTjl . 

Here the Newton basis behaves in a quite unexpected way, since if the uni- 
form grid from which the data-sites are selected by the algorithm is reduced, 
namely if the one dimensional grid size changes from A^; = 0.01 to A^; = 0.05, 
the accuracy of the interpolant becomes strictly better, as depicted in the 
top right part of Figure fl4.7p . 
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This thesis presents a way to construct a new kind of stable basis for the 
RBF approximation. 

The basis is derived from the procedure described in Chapter ([2]), and is 
build to have useful properties related to its $-ort honor mality. 
Furthermore, the particular approach used here connects this discrete basis 
with the "natural" one described in the Theorem (II. Sp . and this allows to 
relate some functional property of the kernel to the approximant itself. 

In this setting, a more deep study could lead to a stronger use of the informa- 
tion provided by the kernel and the domain. In particular the convergence 
estimate of Proposition (13. lip can be refined considering the rate of conver- 
gence to zero of the eigenvalues of the operator T$ and the property and the 
convergence rate of the Nystrom method based on the setting of the problem, 
namely the choosen cubature rule, the kernel $, the shape parameter e and 
the set Vt. 

As regards stability, the experiments presented in Chapter confirms the 
good behaviour expected from Proposition (13. 9p . In particular our basis al- 
lows to treat approximants based on a relatively big number of points also 
for not optimized shape parameters and on quite general sets. This feature 
can be enforced thanks to the possibility to compute a weighted least-squares 
approximant simply truncating the interpolant. From a numerical point of 
view this procedure can be accomplished without thinning the data-sites set 
X G Vt, but simply checking if the singular values decay under a certain 
tolerance. This corresponds to solve the linear sistem related to the kernel 
matrix with a (weighted) total least-squares algorithm. 

The dependence of the basis on a singular value decomposition does not 
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allow to produce an adaptive algorithm, but forces to compute a full factor- 
ization of the matrix for each fixed points distribution. In this sense, it would 
be interesting to adapt our method to work for approximation based on com- 
pactly supported kernels. Indeed, altough it is possible to use them as any 
other kernel as done in Section (14. 4p . a more specific implementation could 
benefit from the compact support and hence produce sparse kernel matrices. 
In this setting there are eigenvalue algorithms optimized for finding only a 
small subset of the full spectrum of a matrix, and then it would be possible to 
compute an approximant based only on eigenvalues upon a certain tolerance. 
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